2020-08-02

Onchocerciasis

  • Disease caused by the filarial nematode Onchocerca volvulus
  • It is common in tropical and subtropical areas of Africa and some parts of South America.
  • At least 25 million people are infected globally and 110 million are at risk of infection.
    Global burden and clinical manifestations

  • It causes debilitating illness such as onchodermatitis and blindness.

Life cycle of O. volvulus

  • It needs two different hosts to complete its life cycle: humans and black flies (Simulium spp.)
    Life cycle of _O. volvulus_

  • Ivermectin is the main drug used for treatment.

Modelling in onchocerciasis

  • Because of the complexity of the life cycle and the inability to grow O. volvulus in vitro, mathematical models are important for studying the parasite’s biology and the disease epidemiology.
    Modelling to study the effect of frequency of ivermectin treatment on microfilarial prevalence (Hamley et al., 2020)

Rationale: Why geospatial model?

  • A recent epidemiological mapping study shows that prevalence across Africa is heterogeneous and patchy.
    Onchocerciasis prevalence map (Zoure et al., 2014)

Rationale: Why geospatial model? (contd.)

  • Prevalence and transmission are spatially continuous processes and extend beyond administrative borders.
  • Simulium vectors have a specific ecological niche (they need fast-flowing rivers for breeding).
  • Different ecological settings affect the clinical manifestations.
  • With onchocerciasis control progressing towards elimination, geospatially explicit models are becoming more important.

Project Aims

  • Aim 1: To develop a geospatial modelling framework for analysis of onchocerciasis prevalence
    • Identify the different types of data needed for the analysis
    • Determine the ecological and socio-demographic factors driving onchocerciasis epidemiology
  • Aim 2: To investigate methods to extract epidemiologically relevant estimates from vector and parasite genetic data
    • Determine ecological factors affecting vector and parasite population distribution
    • Infer migration patterns and dispersal of vector populations using landscape genetics analysis
  • Aim 3: To model different scenarios, such as the effect of drug intervention and vector control, at different geospatial scales

Expectations

  • An updated spatio-temporal prevalence map for Ethiopia and other African regions depending on data availability
  • Ecological factors driving the vector population might determine the distribution of the parasite population and thus govern onchocerciasis prevalence.
  • A method to incorporate genetic data into a geospatially explicit model for onchocerciasis
  • A tool to facilitate monitoring and strategy formulation for onchocerciasis elimination campaigns

Project progress

Aim 1

Geospatial modelling framework for prevalence data

  • Identified sources of data needed
    • Climate and environmental data (WorldClim, SEDAC, NOAA, satellite data repositories)
    • Prevalence data (systematic literature search, relevant public health institutes)
    • Genetic data (lab repository)
  • Two different geospatial modelling frameworks for prevalence data were explored
    • Machine learning approach: Random forest algorithm
    • Bayesian approach: Integrated Nested Laplace Approximation (INLA)

Data sources for the prototype geospatial model

  • Ethiopian prevalence data from a publicly available database (Hill et al., 2019)

Onchocerciasis prevalence data from Ethiopia used for analysis

Climate and socio-demographic covariates

Raster layers of some of the covariates masked to the border of Ethiopia

  • A total of 33 different covariates were downloaded

Selection of covariates

  • A hierarchical clustering algorithm was used to select the most representative covariates
    Dendrogram from the clustering analysis showing the different clusters of covariates

  • Covariate lists based on 5, 10 and 15 clusters were generated
  • The potential influence of specific covariates (distance to river, rural-urban extent) on onchocerciasis prevalence was also considered
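
The covariate selection step can be sketched roughly as follows. This is a minimal illustration, not the analysis code used for the slides: it assumes the clustering is done on correlation distance (1 - |r|) with average linkage, and the covariate names and values are hypothetical. The linkage method and distance actually used are not stated in the slides.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_covariates(covariates, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain.

    The distance between two covariates is 1 - |r| (an assumption), so strongly
    correlated covariates, positive or negative, end up in the same cluster.
    """
    names = list(covariates)
    dist = {
        frozenset(pair): 1.0 - abs(pearson(covariates[pair[0]], covariates[pair[1]]))
        for pair in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters


# Hypothetical covariate values at six sample sites (illustration only).
toy_covariates = {
    "temperature": [20.1, 22.3, 25.0, 27.4, 30.2, 31.0],
    "elevation": [2400, 2100, 1700, 1300, 900, 800],
    "rainfall": [900, 1500, 700, 1600, 800, 1400],
    "ndvi": [0.42, 0.71, 0.35, 0.74, 0.38, 0.66],
}

clusters = cluster_covariates(toy_covariates, k=2)
print(clusters)
```

In this toy data, temperature and elevation move together, as do rainfall and NDVI, so cutting at two clusters groups them that way. One covariate per cluster could then be kept as its representative; the rule for choosing it is left open here.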

Random Forest Model

  • Spatial dependency in the data was accounted for by incorporating buffer distances to the sample locations
  • Model selection with a k-fold cross-validation approach
    Five-fold cross-validation for model validation and selection

  • Root mean square error and R-squared values were calculated for each model
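
The validation loop can be sketched as below. This is a minimal, standard-library illustration of five-fold cross-validation with RMSE and R-squared, not the code behind the slides. To stay self-contained it uses a one-covariate least-squares line as a stand-in model (`fit_line`, a hypothetical helper) and noise-free toy data; a random forest would be passed in through the same `fit` argument.

```python
import random
from math import sqrt


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Stand-in model (assumption): ordinary least squares on one covariate."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return lambda v: my + slope * (v - mx)


def cross_validate(x, y, fit, k=5, seed=0):
    """Return one (RMSE, R-squared) pair per held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        obs = [y[i] for i in test_idx]
        pred = [model(x[i]) for i in test_idx]
        scores.append((rmse(obs, pred), r_squared(obs, pred)))
    return scores


# Hypothetical data: prevalence falling linearly with distance to river (km).
distance_km = [float(d) for d in range(20)]
prevalence = [0.5 - 0.02 * d for d in distance_km]

scores = cross_validate(distance_km, prevalence, fit_line, k=5)
for fold, (err, r2) in enumerate(scores, start=1):
    print(f"fold {fold}: RMSE={err:.4f}, R2={r2:.4f}")
```

Candidate models, for example those built on the 5, 10 and 15 cluster covariate lists, would each be scored this way and compared on their mean RMSE and R-squared across folds.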

Random Forest Model selection

Prevalence prediction with Random Forest Model

Predicted median prevalence with Random Forest Model

Random Forest Model: Prediction error

  • Prediction error was calculated from the upper and lower limits of predicted prevalence

    The prediction error is higher in the locations where predicted prevalence is higher

Effect of covariates

  • The importance of covariates can be assessed with a variable importance plot
    Variable importance plot for covariates in the random forest model

Effect of covariates (contd.)

  • Linear regression analysis was done to assess the relationship between the covariates and the predicted prevalence
    Linear regression model for covariates and the predicted prevalence

Bayesian Approach: INLA

  • Allows prior knowledge about the parameters to be incorporated in the form of probability distributions
  • The number of cases (\(Y_i\)) observed out of the total number of people tested (\(N_i\)) was assumed to follow a binomial distribution \[ Y_i \mid P(\boldsymbol{x}_i) \sim \mathrm{Binomial}(N_i, P(\boldsymbol{x}_i)) \]
  • The log odds of prevalence were modelled as \[ \operatorname{logit}(P(\boldsymbol{x}_i)) = \beta_0 + \mathbf{X}_i^\intercal \boldsymbol{\beta} + S(\boldsymbol{x}_i). \]
  • \(\mathbf{X}_i\) holds the covariates at location \(\boldsymbol{x}_i\), and \(S(\cdot)\) is a spatial random effect with a Matérn covariance function.
  • The stochastic partial differential equation (SPDE) approach is used to fit a spatial model and predict the variable of interest at unsampled locations
  • An approximate solution to the SPDE can be found with the finite element method on a triangulated mesh of the study region
  • The model was selected by comparing the Watanabe-Akaike information criterion (WAIC) and the deviance information criterion (DIC)
  • Normal priors with mean 0 and precision 0.001 were used for the covariate coefficients.
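
For reference, the Matérn covariance and its SPDE link are usually written as below. The slides do not give this notation, so the symbols are introduced here as assumptions: \(\sigma^2\) is the marginal variance, \(\kappa\) a scale parameter, \(\nu\) the smoothness, \(K_\nu\) the modified Bessel function of the second kind, \(\tau\) a variance-controlling parameter, \(\Delta\) the Laplacian, \(\mathcal{W}\) Gaussian white noise, and \(d = 2\) the spatial dimension. The last line restates the coefficient prior in terms of variance.

```latex
% Matérn covariance of the spatial random effect between locations x and x'
\[
\operatorname{Cov}\bigl(S(\boldsymbol{x}), S(\boldsymbol{x}')\bigr)
  = \frac{\sigma^2}{2^{\nu-1}\,\Gamma(\nu)}
    \bigl(\kappa \lVert \boldsymbol{x} - \boldsymbol{x}' \rVert\bigr)^{\nu}
    K_{\nu}\bigl(\kappa \lVert \boldsymbol{x} - \boldsymbol{x}' \rVert\bigr)
\]

% SPDE whose stationary solution is a Gaussian field with this covariance
\[
(\kappa^2 - \Delta)^{\alpha/2} \bigl(\tau\, S(\boldsymbol{x})\bigr) = \mathcal{W}(\boldsymbol{x}),
\qquad \alpha = \nu + d/2
\]

% Precision 0.001 corresponds to variance 1/0.001 = 1000
\[
\beta_j \sim \mathcal{N}\bigl(0,\ 1/0.001\bigr) = \mathcal{N}(0,\ 1000)
\]
```

A variance of 1000 (standard deviation of about 31.6 on the logit scale) makes the coefficient priors only weakly informative.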

Prevalence prediction with INLA Model

Mean prevalence map generated from the INLA model

  • The Great Rift Valley appears to be the major geographical barrier influencing onchocerciasis epidemiology

INLA Model: Prediction error

Areas with ground-truth data have lower prediction error

Effect of covariates

Posterior probability distribution of effect parameter of covariates

  • Correlation between predicted and observed prevalence was higher for the Random Forest model (97%) than for the INLA model (89%)
Scatter plot for the observed and predicted prevalence for the Random forest and the INLA model
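
The comparison of observed and predicted prevalence can be summarised with a Pearson correlation, as in the minimal sketch below. The slides do not state which correlation coefficient was used, so Pearson is an assumption, and the prevalence values here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical observed and model-predicted prevalence at four sites.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

r = pearson(observed, predicted)
print(f"correlation between observed and predicted: {r:.3f}")
```

The same function would be applied once per model, using each model's predictions at the sampled locations.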

Next steps

  • Collate prevalence data with greater spatial and temporal coverage
  • Prepare additional covariates capturing river flow, as well as temporal covariates for climate and socio-demographic data
  • Estimate epidemiologically relevant parameters from parasite genetic data with landscape genetic analysis
    • Create connectivity and resistance surface maps, which might provide insight into the migration patterns and dispersal of vector populations
    • Identify environmental factors affecting the population structure of vectors and parasites
  • Expand the current empirical geospatial model into a dynamic model, which will provide greater flexibility to model different intervention scenarios

Gantt chart

Timeline for the project

Acknowledgement

  • Assoc. Prof. Warwick Grant
  • Dr. Shannon Hedtke
  • Dr. Karen McCulloch
  • Dr. Joel Miller
  • Dr. Rebecca Chisholm
  • The Grant Lab members

References

Thank you

Any questions?